29 research outputs found

    Efficient image copy detection using multi-scale fingerprints

    Inspired by the multi-resolution histogram, we propose a multi-scale SIFT descriptor to improve discriminability. A series of SIFT descriptions at different scales is first acquired by varying the actual size of each spatial bin. Then principal component analysis (PCA) is employed to reduce them to low-dimensional vectors, which are further combined into one 128-dimension multi-scale SIFT description. Next, an entropy-maximization-based binarization is employed to encode the descriptions into binary codes called fingerprints for indexing the local features. Furthermore, an efficient search architecture consisting of lookup tables and an inverted image ID list is designed to improve the query speed. Since fingerprint building is of low complexity, this method is very efficient and scalable to very large databases. In addition, the multi-scale fingerprints are very discriminative, so that copies can be effectively distinguished from similar objects, which leads to improved performance in the detection of copies. The experimental evaluation shows that our approach outperforms the state-of-the-art methods.
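
    The binarization and lookup steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "entropy maximization" can be approximated by thresholding each dimension at its median (which splits the training values roughly in half, so each bit carries close to one bit of entropy), and that the lookup table maps exact fingerprints to sets of image IDs. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def fit_thresholds(descriptors):
    """Per-dimension median over the training descriptors (assumed binarization rule)."""
    dims = len(descriptors[0])
    return [median(d[i] for d in descriptors) for i in range(dims)]


def fingerprint(descriptor, thresholds):
    """Pack one descriptor into an integer, one bit per dimension."""
    code = 0
    for value, threshold in zip(descriptor, thresholds):
        code = (code << 1) | (1 if value > threshold else 0)
    return code


def build_index(image_descriptors, thresholds):
    """Lookup table: fingerprint -> set of image IDs that contain it (inverted list)."""
    index = defaultdict(set)
    for image_id, descriptors in image_descriptors.items():
        for d in descriptors:
            index[fingerprint(d, thresholds)].add(image_id)
    return index


def query(index, descriptors, thresholds):
    """Count, per database image, how many query fingerprints match exactly."""
    votes = defaultdict(int)
    for d in descriptors:
        for image_id in index.get(fingerprint(d, thresholds), ()):
            votes[image_id] += 1
    return dict(votes)
```

    Packing bits into an integer keeps each lookup a single dictionary access. A practical system would also tolerate a few flipped bits per fingerprint, which this sketch omits.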

    Deep Self-Taught Hashing for Image Retrieval

    Hashing algorithms have been widely used to speed up image retrieval because of their compact binary codes and fast distance calculation. Combining hashing with deep learning boosts its performance by learning accurate representations and complicated hashing functions. So far, the most striking successes in deep hashing have mostly involved discriminative models, which require labels. To apply deep hashing to datasets without labels, we propose a deep self-taught hashing algorithm (DSTH), which generates a set of pseudo labels by analyzing the data itself, and then learns the hash functions for novel data using discriminative deep models. Furthermore, we generalize DSTH to support both supervised and unsupervised cases by adaptively incorporating label information. We use two different deep learning frameworks to train the hash functions, to deal with the out-of-sample problem and reduce the time complexity without loss of accuracy. We have conducted extensive experiments to investigate different settings of DSTH, and compared it with state-of-the-art counterparts on six publicly available datasets. The experimental results show that DSTH outperforms the others on all datasets.

    Efficient Image Copy Detection Using Multiscale Fingerprints

    No full text

    Graph-without-cut: an ideal graph learning for image segmentation

    No full text
    Graph-based image segmentation organizes the image elements into graphs and partitions an image based on the graph. It has been widely used, and many promising results have been obtained. Since the segmentation performance depends heavily on the graph, most existing methods focus on obtaining a precise similarity graph or on designing efficient cutting/merging strategies. However, these two components are often carried out in two separate steps, so the obtained graph similarity may not be optimal for segmentation, which may lead to suboptimal results. In this paper, we propose a novel framework, Graph-WithoutCut (GWC), for learning the similarity graph and image segmentations simultaneously. GWC learns the similarity graph by assigning adaptive and optimal neighbors to each vertex based on the spatial and visual information. Meanwhile, a new rank constraint is imposed on the Laplacian matrix of the similarity graph, such that the number of connected components in the resulting similarity graph is exactly equal to the number of regions. Extensive empirical results on three public data sets (i.e., BSDS300, BSDS500 and MSRC) show that our unsupervised GWC achieves state-of-the-art performance compared with supervised and unsupervised image segmentation approaches.
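
    The rank constraint mentioned in this abstract rests on a standard fact from spectral graph theory: for a graph with n vertices, non-negative weights and c connected components, the Laplacian L = D - W has rank n - c. The sketch below only checks that relationship on a small hand-made graph; it does not implement GWC's graph learning, and all function names are hypothetical.

```python
def laplacian(weights):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix."""
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def count_components(weights, tol=1e-12):
    """Number of connected components, found with union-find over positive edges."""
    n = len(weights)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if weights[i][j] > tol:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})


def matrix_rank(matrix, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting (fine for small matrices)."""
    m = [row[:] for row in matrix]
    rows, cols = len(m), len(m[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, rows):
            factor = m[r][c] / m[rank][c]
            for k in range(c, cols):
                m[r][k] -= factor * m[rank][k]
        rank += 1
    return rank
```

    Requiring rank(L) = n - k during graph learning therefore yields a graph that already falls apart into k regions, so no separate cutting step is needed.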

    Type 2 Diabetes Biomarkers of Human Gut Microbiota Selected via Iterative Sure Independent Screening Method

    No full text
    Type 2 diabetes, a complex metabolic disease influenced by genetic and environmental factors, has become a worldwide problem. Previously published results focused on genetic components through genome-wide association studies, which explain this disease only to some extent. Recently, two research groups published metagenome-wide association study (MGWAS) results that found meta-biomarkers related to type 2 diabetes. However, one key problem in analyzing genomic data is how to deal with the ultra-high dimensionality of the features. From a statistical viewpoint, it is challenging to filter true factors in high-dimensional data. Various methods and techniques have been proposed for this issue, but they achieve only limited prediction performance and poor interpretability. A new statistical procedure with higher performance and clear interpretability is appealing for analyzing high-dimensional data. To address this problem, we apply a statistical variable selection procedure called iterative sure independence screening to gene profiles obtained from metagenome sequencing; 48 and 24 meta-markers were selected in the Chinese and European cohorts, respectively, as predictors with AUC (area under the curve) values of 0.97 and 0.99, a better performance than other model selection methods. These results demonstrate the power and utility of data mining technologies on large-scale, ultra-high-dimensional genomic datasets for identifying diagnostic and predictive markers.
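
    As a rough illustration of the screening idea named in this abstract, the sketch below ranks features by the absolute marginal Pearson correlation with the response and keeps the top few. This is only the basic, single-pass screening step; the iterative variant used in the paper additionally alternates screening with a penalized model fit, which is omitted here. The function names and the column-wise data layout are assumptions made for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_features(feature_columns, response, keep):
    """Return indices of the `keep` features with the largest |marginal correlation|."""
    scores = [abs(pearson(column, response)) for column in feature_columns]
    ranked = sorted(range(len(feature_columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:keep]
```

    Because each feature is scored on its own, the cost grows linearly with the number of features, which is what makes this kind of screening practical when features far outnumber samples.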

    Metagenomics Biomarkers Selected for Prediction of Three Different Diseases in Chinese Population

    No full text
    Dysbiosis of the human microbiome has been proven to be associated with the development of many human diseases. Metagenome sequencing has emerged as a powerful tool to investigate the effects of the microbiome on diseases. Identification of human gut microbiome markers associated with abnormal phenotypes may facilitate feature selection for multiclass classification. Compared with binary classifiers, multiclass classification models deploy more complex discriminative patterns. Here, we developed a pipeline to address the challenging characterization of multilabel samples. In this study, a total of 300 biomarkers were selected from the microbiome of 806 Chinese individuals (383 controls, 170 with type 2 diabetes, 130 with rheumatoid arthritis, and 123 with liver cirrhosis), and a logistic regression prediction algorithm was then applied to those markers as the model's intrinsic features. The estimated model produced an F1 score of 0.9142, which was better than other popular classification methods, and an average receiver operating characteristic (ROC) of 0.9475 showed a significant correlation between the biomarkers selected from the microbiome and the corresponding phenotypes. The results of this study indicate that machine learning is a vital tool for mining microbiome data to identify disease-related biomarkers, which may contribute to the application of microbiome-based precision medicine in the future.

    Averaged AUC obtained from SVM classifier combined with three variable selection methods.

    No full text
    SVM classifier performance estimated as a function of sample size in a 50 × 10-fold cross-validation setting. For the Chinese dataset, we show the accuracy of the 60-gene set from ensemble feature selection and the 48-gene set from ISIS-SCAD. For the European dataset, the accuracy of ensemble feature selection is computed on 60 genes and that of ISIS-SCAD on 24 genes.